In this paper, we introduce a family of robust estimates for the
parametric and nonparametric components under a generalized partially
linear model, where the data are modeled by
$y_i|(x_i,t_i) \sim F(\cdot,\mu_i)$ with $\mu_i = H(\eta(t_i) + x_i^T\beta)$,
for some known distribution function $F$ and link function $H$. It is
shown that the estimates of $\beta$ are root-n consistent and
asymptotically normal. Through a Monte Carlo study, the performance of
these estimators is compared with that of the classical ones.

1. Introduction. Semiparametric models contain both a parametric and 
a nonparametric component. Sometimes, the nonparametric component plays 
the role of a nuisance parameter. Much research has been done on estima- 
tors of the parametric component in a general framework, aiming to obtain 
asymptotically efficient estimators. The aim of this paper is to consider semi- 
parametric versions of the generalized linear models where the response y is 
to be predicted by covariates $(x,t)$, where $x \in \mathbb{R}^p$ and
$t \in \mathcal{T} \subset \mathbb{R}$. It will be assumed that the
conditional distribution of $y|(x,t)$ belongs to the canonical
exponential family $\exp[y\theta(x,t) - B(\theta(x,t)) + C(y)]$ for known
functions $B$ and $C$. Then $\mu(x,t) = E(y|(x,t)) = B'(\theta(x,t))$,
with $B'$ denoting the derivative of $B$. In generalized linear models
[19], which constitute a popular approach for modeling a wide variety of
data, it is often assumed that the mean is modeled linearly through a
known inverse link function, $g$, that is,

$g(\mu(x,t)) = \beta_0 + x^T\beta + \alpha t$.
For instance, an ordinary logistic regression model assumes that the
observations $(y_i,x_i,t_i)$ are such that the response variables are
independent binomial variables $y_i|(x_i,t_i) \sim Bi(1,p_i)$, whose
success probabilities depend on the explanatory variables through the
relation $g(p_i) = \beta_0 + x_i^T\beta + \alpha t_i$, with
$g(u) = \ln(u/(1-u))$.

The influence function of the classical estimates based on the quasi- 
likelihood is unbounded. Large deviations of the response from its mean, 
as measured by the Pearson residuals, or outlying points in the covariate 
space, can have a large influence on the estimators. Those outliers or poten- 
tial outliers for the generalized linear regression model are to be detected 
and controlled by robust procedures such as those considered by Stefanski, 
Carroll and Ruppert [23], Künsch, Stefanski and Carroll [17], Bianco and
Yohai [5] and Cantoni and Ronchetti [9]. 

In some applications, the linear model is insufficient to explain the re- 
lationship between the response variable and its associated covariates. To 
avoid the curse of dimensionality, we allow most predictors to be modeled 
linearly while a small number of predictors (possibly just one) enter the 
model nonparametrically. The relationship will be given by the semipara- 
metric generalized partially linear model 

(1) $\mu(x,t) = H(\eta(t) + x^T\beta)$,

where $H = g^{-1}$ is a known link function, $\beta \in \mathbb{R}^p$ is an
unknown parameter and $\eta$ is an unknown continuous function.

Severini and Wong [22] introduced the concept of generalized profile likeli- 
hood, which was later applied to this model by Severini and Staniswalis [21]. 
In this method, the nonparametric component is viewed as a function of 
the parametric component and $\sqrt{n}$-consistent estimates for the parametric
component can be obtained when the usual optimal rate for the smoothing 
parameter is used. Such estimates do not deal with outlying observations. 
In a semiparametric setting, outliers can have a devastating effect, since the 
extreme points can easily affect the scale and shape of the function estimate
of $\eta$, leading to possible wrong conclusions concerning $\beta$. The basic
ideas from robust smoothing and robust regression estimation have been 
adapted to partially linear regression models where H(t) = t; we refer to 
[3, 13, 15]. A robust generalized estimating equations approach for general- 
ized partially linear models with clustered data, using regression splines and 
Pearson residuals, is given in [14]. 

In Section 2 of the present paper, we introduce a two-step robust procedure
for estimating the parameter $\beta$ and the function $\eta$ under the
generalized partially linear model (1). In Section 3, we give conditions under
which the proposed method will lead to strongly consistent estimators and 
in Section 4, we derive the asymptotic distribution of those estimators. In 
Section 5, simulation studies are carried out to assess the robustness and 
efficiency of the proposals. The proofs are deferred to the Appendix. 



ROBUST SEMIPARAMETRIC REGRESSION 






2. The proposal. 

2.1. The estimators. Let $(y_i,x_i,t_i)$ be independent observations such
that $y_i|(x_i,t_i) \sim F(\cdot,\mu_i)$, with
$\mu_i = H(\eta(t_i) + x_i^T\beta)$ and
$\mathrm{VAR}(y_i|(x_i,t_i)) = V(\mu_i)$. Let $\eta_0(t)$ and $\beta_0$
denote the true parameter values and $E_0$ the expected value under the
true model, so that $E_0(y|(x,t)) = H(\eta_0(t) + x^T\beta_0)$. Letting
$\rho(y,u)$ be a loss function to be specified in the next subsection, we
define

(2) $S_n(a,\beta,t) = \sum_{i=1}^{n} W_i(t)\rho(y_i, x_i^T\beta + a)w_1(x_i)$,

(3) $S(a,\beta,\tau) = E_0[\rho(y, x^T\beta + a)w_1(x)\,|\,t = \tau]$,

where $W_i(t)$ are the kernel (or nearest-neighbor with kernel) weights on
$t_i$ and $w_1(\cdot)$ is a function that downweights high leverage points
in the x space. Note that $S_n(a,\beta,\tau)$ is an estimate of
$S(a,\beta,\tau)$, which is a continuous function of $(a,\beta,\tau)$ if
$(y,x)|t = \tau$ has a distribution function that is continuous with
respect to $\tau$.

Fisher consistency states that
$\eta_0(t) = \arg\min_a S(a,\beta_0,t)$. This is a key point in order to
get asymptotically unbiased estimates for the nonparametric component. In
many situations, a stronger condition holds, that is, under general
conditions, it can be verified that

(4) $S(\eta_0(t),\beta_0,t) < S(a,\beta,t)$ for all $\beta \ne \beta_0$, $a \ne \eta_0(t)$,

which entails Fisher consistency.

Following the ideas of Severini and Staniswalis [21], we define the function
$\eta_\beta(t)$ as the minimizer of $S(a,\beta,t)$ that will be estimated
by the minimizer $\hat\eta_\beta(t)$ of $S_n(a,\beta,t)$.

To provide an estimate of $\beta$ with root-n convergence rate, we denote

(5) $F_n(\beta) = n^{-1}\sum_{i=1}^{n} \rho(y_i, x_i^T\beta + \hat\eta_\beta(t_i))w_2(x_i)$,

(6) $F(\beta) = E_0[\rho(y, x^T\beta + \eta_\beta(t))w_2(x)]$,

where $w_2(\cdot)$ plays the same role (and can be taken to be the same) as
$w_1(\cdot)$. We will assume that $\beta_0$ is the unique minimizer of
$F(\beta)$. This assumption is a standard condition in M-estimation in
order to get consistent estimators of the parametric component and is
analogous to condition (A-4) of [16], page 129.

A two-step robust proposal is now given as follows: 
• Step 1: For each value of $t$ and $\beta$, let

(7) $\hat\eta_\beta(t) = \arg\min_{a \in \mathbb{R}} S_n(a,\beta,t)$.

• Step 2: Define the estimate of $\beta_0$ as

(8) $\hat\beta = \arg\min_{\beta} F_n(\beta)$

and the estimate of $\eta_0(t)$ as $\hat\eta_{\hat\beta}(t)$.
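
The two steps can be sketched numerically as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the covariate x is scalar, both minimizations are done by brute-force grid search, the kernel is the triangular kernel used later in the Monte Carlo study, and the loss `bernoulli_loss` is the plain (non-robust) Bernoulli negative log-likelihood, used here as a stand-in for a robust loss of the kind introduced in Section 2.2. All function names and the toy data settings are hypothetical.

```python
import numpy as np

def kernel_weights(t0, t, h):
    """Normalized weights W_i(t0) for the triangular kernel K(u) = max(0, 1 - |u|)."""
    k = np.maximum(0.0, 1.0 - np.abs((t0 - t) / h))
    return k / k.sum()  # assumes at least one t_i lies within h of t0

def bernoulli_loss(y, u):
    """Stand-in rho(y, u): negative Bernoulli log-likelihood under a logistic link."""
    return np.logaddexp(0.0, u) - y * u

def eta_hat(beta, t0, y, x, t, h, rho, w1, a_grid):
    """Step 1, eq. (7): grid minimizer over a of S_n(a, beta, t0)."""
    W = kernel_weights(t0, t, h)
    # Row k holds rho(y_i, x_i * beta + a_k) for every observation i.
    losses = rho(y[None, :], x[None, :] * beta + a_grid[:, None])
    S_n = losses @ (W * w1)
    return a_grid[np.argmin(S_n)]

def F_n(beta, y, x, t, h, rho, w1, w2, a_grid):
    """Objective of eq. (5), with eta_hat plugged in at every observed t_i."""
    eta = np.array([eta_hat(beta, ti, y, x, t, h, rho, w1, a_grid) for ti in t])
    return np.mean(rho(y, x * beta + eta) * w2)

def two_step(y, x, t, h, rho, w1, w2, beta_grid, a_grid):
    """Step 2, eq. (8): grid minimizer of F_n, then the matching curve estimate."""
    values = [F_n(b, y, x, t, h, rho, w1, w2, a_grid) for b in beta_grid]
    beta_hat = beta_grid[int(np.argmin(values))]
    eta_fit = np.array(
        [eta_hat(beta_hat, ti, y, x, t, h, rho, w1, a_grid) for ti in t]
    )
    return beta_hat, eta_fit

# Toy data: every setting below is illustrative and not taken from the paper.
rng = np.random.default_rng(0)
n = 120
x = rng.normal(size=n)
t = rng.uniform(size=n)
p = 1.0 / (1.0 + np.exp(-(2.0 * x + np.sin(2.0 * np.pi * t))))
y = rng.binomial(1, p).astype(float)
w = 1.0 / np.sqrt(1.0 + (x - np.median(x)) ** 2)  # downweights leverage points
beta_grid = np.arange(0.0, 4.0001, 0.25)
a_grid = np.linspace(-5.0, 5.0, 101)
beta_hat, eta_fit = two_step(y, x, t, 0.2, bernoulli_loss, w, w, beta_grid, a_grid)
```

Passing the loss as the argument `rho` keeps the two-step structure separate from the choice of loss, so a bounded loss with its correction term could be substituted without touching the rest of the sketch.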



2.2. Loss function $\rho$. We propose two classes of loss functions. The
first aims to bound the deviances, while the second, introduced by Cantoni
and Ronchetti [9], bounds the Pearson residuals.

The first class of loss functions takes the form

(9) $\rho(y,u) = \phi[-\ln f(y,H(u)) + A(y)] + G(H(u))$,

where $\phi$ is a bounded nondecreasing function with continuous derivative
$\varphi$ and $f(\cdot,s)$ is the density of the distribution function
$F(\cdot,s)$ with $y|(x,t) \sim F(\cdot, H(\eta_0(t) + x^T\beta_0))$. To
avoid triviality, we also assume that $\phi$ is nonconstant in a positive
probability set. Typically, $\phi$ is a function which behaves like the
identity function in a neighborhood of 0. The function $A(y)$ is typically
used to remove a term from the log-likelihood that is independent of the
parameter and can be defined as $A(y) = \ln(f(y,y))$ in order to obtain the
deviance. The correction term $G$ is used to guarantee Fisher consistency
and satisfies

$G'(s) = \int \varphi[-\ln f(y,s) + A(y)]f'(y,s)\,d\mu(y)$
$\phantom{G'(s)} = E_s(\varphi[-\ln f(y,s) + A(y)]f'(y,s)/f(y,s))$,

where $E_s$ indicates expectation taken under $y \sim F(\cdot,s)$ and
$f'(y,s)$ is shorthand for $\partial f(y,s)/\partial s$. With this class of
$\rho$ functions, we call the resulting estimator a modified likelihood
estimator.

In a logistic regression setting, Bianco and Yohai [5] considered the score
function

$\phi(t) = \begin{cases} t - t^2/(2c) & \text{if } t \le c,\\ c/2 & \text{otherwise,} \end{cases}$

while Croux and Haesbroeck [12] proposed using the score function

$\phi(t) = \begin{cases} t\exp(-\sqrt{c}) & \text{if } t \le c,\\ -2(1+\sqrt{t})\exp(-\sqrt{t}) + (2(1+\sqrt{c})+c)\exp(-\sqrt{c}) & \text{otherwise.} \end{cases}$

Both score functions can be used in the general setting. Explicit forms of
the correction term $G(s)$ for the binomial and Poisson families are given
in [1]. It is worth noting that when considering the deviance and a
continuous family of distributions with strongly unimodal density function,
the correction term $G$ can be avoided, as discussed in [4].
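
As a small numerical check of the Croux and Haesbroeck score function, the sketch below evaluates $\phi$ on an array of nonnegative deviance-type arguments. It is an illustration only: the function name is hypothetical, and the branch for $t \le c$ is taken to be $t\exp(-\sqrt{c})$, which is the form that makes $\phi$ continuously differentiable at $t = c$ given the "otherwise" branch.

```python
import numpy as np

def phi_croux_haesbroeck(t, c=0.5):
    """Bounded score function phi(t) for t >= 0, with tuning constant c."""
    t = np.asarray(t, dtype=float)
    sc = np.sqrt(c)
    # Branch used for t <= c: linear in t.
    lower = t * np.exp(-sc)
    # Branch used for t > c: increases toward the bound (2(1 + sqrt(c)) + c) exp(-sqrt(c)).
    st = np.sqrt(np.maximum(t, 0.0))
    upper = -2.0 * (1.0 + st) * np.exp(-st) + (2.0 * (1.0 + sc) + c) * np.exp(-sc)
    return np.where(t <= c, lower, upper)

grid = np.linspace(0.0, 20.0, 2001)
values = phi_croux_haesbroeck(grid)
```

The two branches agree at $t = c$, and the values stay below the finite limit reached as $t \to \infty$, which is what bounds the contribution of observations with very large deviances.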
The second class of loss functions is based on [9], wherein the authors
consider a general class of M-estimators of Mallows type by separately
bounding the influence of deviations on $y$ and $(x,t)$. Their approach is
based on robustifying the quasi-likelihood, which is an alternative to the
generalizations given for generalized linear regression models by
Stefanski, Carroll and Ruppert [23] and Künsch, Stefanski and Carroll [17].
Let $r(y,\mu) = (y-\mu)V^{-1/2}(\mu)$ be the Pearson residuals with
$\mathrm{VAR}(y_i|(x_i,t_i)) = V(\mu_i)$. Denote
$\nu(y,\mu) = V^{-1/2}(\mu)\psi_c(r(y,\mu))$, where $\psi_c$ is an odd
nondecreasing score function with tuning constant $c$, such as the Huber
function and

(10) $\rho(y,u) = -\left[\int_{s_0}^{H(u)} \nu(y,s)\,ds + G(H(u))\right]$,

where $s_0$ is such that $\nu(y,s_0) = 0$ and the correction term (included
to ensure Fisher consistency), also denoted $G(s)$, satisfies
$G'(s) = -E_s(\nu(y,s))$. With such a $\rho$ function, we call the
resulting estimator a robust quasi-likelihood estimator. For the binomial
and Poisson families, explicit forms of the correction term $G(s)$ are
given in [9].

2.3. General comments.

(a) Fisher consistency and uniqueness. Under a logistic partially linear
regression model, if



and if we consider the loss function given by (9) with $\phi$ satisfying
the regularity conditions given in [5], it is easy to see that (4) holds
and that Fisher consistency for the nonparametric component is attained
under this model. Moreover, it is easy to verify that $\beta_0$ is the
unique minimizer of $F(\beta)$ in this case. The same assertion can be
verified for the robust quasi-likelihood proposal if $\psi_c$ is bounded
and increasing.

Under a generalized partially linear model with the response having a
gamma distribution with a fixed shape parameter, Theorem 1 of Bianco,
Garcia Ben and Yohai [4] allows us to verify (4) and Fisher consistency for
the nonparametric and parametric components if the score function $\phi$ is
bounded and strictly increasing on the set where it is not constant and if
(11) holds.

For any generalized partially linear model, conditions similar to those
considered in [9] will lead to the desired uniqueness implied by (4). Note
that this condition is quite similar to Condition (E) of [21], page 511.
When considering the classical quasi-likelihood, the assumption
$\beta_0 = \arg\min_\beta F(\beta)$ is related to Condition (7.e) of [21],
page 511, but for the robust quasi-likelihood, this assumption is
fulfilled, for instance, for a gamma family with a fixed shape parameter
such that (11) holds and $\psi_c$ is bounded and increasing.
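
For the robust quasi-likelihood loss of Section 2.2, the building blocks $r(y,\mu)$, $\psi_c$ and $\nu(y,\mu)$ can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the variance function shown is the Bernoulli one, $V(\mu) = \mu(1-\mu)$, and the Huber constant 1.2 is the value quoted for the simulation study. The correction term $G$ is not computed here.

```python
import numpy as np

def huber_psi(r, c=1.2):
    """Huber score function psi_c(r) = max(-c, min(c, r))."""
    return np.clip(r, -c, c)

def pearson_residual(y, mu, V):
    """Pearson residual r(y, mu) = (y - mu) / sqrt(V(mu))."""
    return (y - mu) / np.sqrt(V(mu))

def nu(y, mu, V, c=1.2):
    """nu(y, mu) = V(mu)^(-1/2) * psi_c(r(y, mu))."""
    return huber_psi(pearson_residual(y, mu, V), c) / np.sqrt(V(mu))

def bernoulli_variance(mu):
    """Variance function of a Bi(1, mu) response (assumed family for this example)."""
    return mu * (1.0 - mu)

# A response y = 0 where the model says mu = 0.99 has a Pearson residual near -9.95,
# which the Huber function truncates to -1.2.
r_outlier = pearson_residual(0.0, 0.99, bernoulli_variance)
psi_outlier = huber_psi(r_outlier)
# A response y = 1 at mu = 0.5 has residual 1, inside [-1.2, 1.2], so it is left untouched.
nu_regular = nu(1.0, 0.5, bernoulli_variance)
```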
(b) Differentiated equations. If the function $\rho(y,u)$ is continuously
differentiable and we denote $\Psi(y,u) = \partial\rho(y,u)/\partial u$,
the estimates will be solutions to the differentiated equations. More
precisely, $\eta_\beta(t)$ and $\hat\eta_\beta(t)$ will be solutions to
$S^1(a,\beta,t) = 0$ and $S_n^1(a,\beta,t) = 0$, respectively, with

(12) $S^1(a,\beta,\tau) = E(\Psi(y, x^T\beta + a)w_1(x)\,|\,t = \tau)$,

(13) $S_n^1(a,\beta,t) = \sum_{i=1}^{n} W_i(t)\Psi(y_i, x_i^T\beta + a)w_1(x_i)$.

Furthermore, $\hat\beta$ is a solution of $F_n^1(\beta) = 0$ and Fisher
consistency implies that $F^1(\beta_0) = 0$ and
$S^1(\eta_0(t),\beta_0,t) = 0$, where

(14) $F^1(\beta) = E\left(\Psi(y, x^T\beta + \eta_\beta(t))w_2(x)\left[x + \frac{\partial}{\partial\beta}\eta_\beta(t)\right]\right)$,

(15) $F_n^1(\beta) = n^{-1}\sum_{i=1}^{n} \Psi(y_i, x_i^T\beta + \hat\eta_\beta(t_i))w_2(x_i)\left[x_i + \frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\right]$.

Note that these first order equations may have multiple solutions and,
therefore, we may need the values of the objective functions (2) and (5) to
select the final estimator. For a family of distributions with positive and
finite information number, Bianco and Boente [1] give conditions that
entail the following: for each $t$, there exists a neighborhood of
$\eta_0(t)$ where $S^1(\eta_0(t),\beta_0,t) = 0$ and
$S^1(a,\beta_0,t) \ne 0$ for $a \ne \eta_0(t)$. Moreover, $\eta_0(t)$
corresponds to a local minimum of $S(a,\beta_0,t)$. The asymptotic results
in this paper are derived by assuming the existence of a unique minimum;
otherwise, one can only ensure that there exists a solution to the
estimating equations that is consistent.

In the modified likelihood approach, the derivative of (9) is given by
$\Psi(y,u) = H'(u)[\Psi_1(y,H(u)) + G'(H(u))]$, where

$\Psi_1(y,H(u)) = \varphi[-\ln f(y,H(u)) + A(y)]\,[-f'(y,H(u))/f(y,H(u))]$.

On the other hand, for the proposal based on the robust quasi-likelihood,
we have the following expression for the derivative of (10):

$\Psi(y,u) = -[\nu(y,H(u)) + G'(H(u))]H'(u)$
$\phantom{\Psi(y,u)} = -[\psi_c(r(y,H(u)))V^{-1/2}(H(u)) + G'(H(u))]H'(u)$
$\phantom{\Psi(y,u)} = -[\psi_c(r(y,H(u))) - E_{H(u)}(\psi_c(r(y,H(u))))]H'(u)V^{-1/2}(H(u))$.

One advantage of solving $S_n^1(a,\beta,t) = 0$ and $F_n^1(\beta) = 0$ is
to avoid the numerical integration involved in the loss function (10), but
the uniqueness of the solutions might be difficult to guarantee in general,
except for those cases discussed in part (a) of this section. Also, note
that when using the score function of Croux and Haesbroeck [12], the
function $G(s)$ in (9) has an explicit expression which does not require
any numerical integration.
(c) Some robustness issues. It is clear that for unbounded response
variables $y$, a bounded score function allows us to deal with large
residuals. For models with a bounded response, for example, under a
logistic model, the advantage of a bounded score function is mainly to
guard against outliers with large Pearson residuals. If a binary response
$y$ is contaminated, the Pearson residuals are large only when the
variances at the contaminated points are close to 0. These points are made
more specific in the simulation study in Section 5.

It is also worth noting that our robust procedures are effective only if at
least one nonconstant covariate $x$ is present. To consider a case without
any covariate, we may take $y_i \sim Bi(1,p)$ as a random sample. Then easy
calculations show that the minimizer $\hat a$ of
$S_n(a) = n^{-1}\sum_{i=1}^{n}\rho(y_i,a)$ equals the classical estimator,
that is, $\hat a = H^{-1}(\sum_{i=1}^{n} y_i/n)$ with
$H(u) = 1/(1+\exp(-u))$, when using either the score function proposed in
[5] or that given by Cantoni and Ronchetti [9]. The same situation obtains
if $y_i|t_i \sim Bi(1,p(t_i))$, where the resulting estimate of $p(t)$ will
be the local mean. In the present paper, with a semiparametric model where
the covariate $x$ plays a role, both downweighting the leverage points and
controlling outlying responses work toward robustness.

3. Consistency. We will assume that $t \in \mathcal{T}$ and let
$\mathcal{T}_0 \subset \mathcal{T}$ be a compact set. For any continuous
function $v : \mathcal{T} \to \mathbb{R}$, we will denote
$\|v\|_\infty = \sup_{t\in\mathcal{T}}|v(t)|$ and
$\|v\|_{0,\infty} = \sup_{t\in\mathcal{T}_0}|v(t)|$.

In this section, we will show that the estimates defined by means of (7)
and (8) are consistent under mild conditions, when the smoother weights are
the kernel weights
$W_i(t) = (\sum_{j=1}^{n} K((t-t_j)/h_n))^{-1}K((t-t_i)/h_n)$. Analogous
results can be obtained for the weights based on nearest neighbors using
arguments similar to those considered in [6]. In this paper, we will use
the following set of assumptions:

C1. The function $\rho(y,a)$ is continuous and bounded and the functions
    $\Psi(y,a) = \partial\rho(y,a)/\partial a$, $w_1(\cdot)$ and
    $w_2(\cdot)$ are bounded.
C2. The kernel $K : \mathbb{R} \to \mathbb{R}$ is an even, nonnegative,
    continuous and bounded function, satisfying $\int K(u)\,du = 1$,
    $\int u^2K(u)\,du < \infty$ and $|u|K(u) \to 0$ as $|u| \to \infty$.
C3. The bandwidth sequence $h_n$ is such that $h_n \to 0$ and
    $nh_n/\log(n) \to \infty$.
C4. The marginal density $f_T$ of $t$ is a bounded function and given any
    compact set $\mathcal{T}_0 \subset \mathcal{T}$, there exists a
    positive constant $A_1(\mathcal{T}_0)$ such that
    $A_1(\mathcal{T}_0) < f_T(t)$ for all $t \in \mathcal{T}_0$.
C5. The function $S(a,\beta,t)$ satisfies the following equicontinuity
    condition: for any $\varepsilon > 0$, there exists some $\delta > 0$
    such that for any $t_1,t_2 \in \mathcal{T}_0$ and
    $\beta_1,\beta_2 \in \mathcal{K}$, a compact set in $\mathbb{R}^p$,

    $|t_1 - t_2| < \delta$ and $\|\beta_1 - \beta_2\| < \delta \Rightarrow \sup_{a\in\mathbb{R}}|S(a,\beta_1,t_1) - S(a,\beta_2,t_2)| < \varepsilon$.

C6. The function $S(a,\beta,t)$ is continuous and $\eta_\beta(t)$ is a
    continuous function of $(\beta,t)$.

Remark 3.1. If the conditional distribution of $x|t = \tau$ is continuous
with respect to $\tau$, the continuity and boundedness of $\rho$ stated in
C1 entail that $S(a,\beta,\tau)$ is continuous.

Assumption C3 ensures that for each fixed $a$ and $\beta$, we have
convergence of the kernel estimates to their mean, while C5 guarantees that
the bias term converges to 0.

Assumption C4 is a standard condition in semiparametric models. In the
classical case, it corresponds to condition (D) of [21], page 511. It is
also considered in nonparametric regression when the uniform consistency
results on the $t$-space are needed; it allows us to deal with the
denominator in the definition of the kernel weights, which is, in fact, an
estimate of the marginal density $f_T$.

Assumption C5 is fulfilled under C1 if the following equicontinuity
condition holds: for any $\varepsilon > 0$, there exist compact sets
$\mathcal{K}_1 \subset \mathbb{R}$ and $\mathcal{K}_p \subset \mathbb{R}^p$
such that for any $\tau \in \mathcal{T}_0$,
$P((y,x) \in \mathcal{K}_1 \times \mathcal{K}_p\,|\,t = \tau) > 1 - \varepsilon$,
which holds, for instance, if, for $1 \le i \le n$ and $1 \le j \le p$,
$x_{ij} = \phi_j(t_i) + u_{ij}$, where $\phi_j$ are continuous functions
and $u_{ij}$ are i.i.d. and independent of $t_i$.

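Theorem 3.1. Let $\mathcal{K} \subset \mathbb{R}^p$ and
$\mathcal{T}_0 \subset \mathcal{T}$ be compact sets such that
$\mathcal{T}_\delta \subset \mathcal{T}$, where $\mathcal{T}_\delta$ is the
closure of a $\delta$-neighborhood of $\mathcal{T}_0$. Assume that C1-C6
and the following conditions hold:

(i) $K$ is of bounded variation;

(ii) the family of functions
$\mathcal{F} = \{f(y,x) = \rho(y,x^T\beta + a)w_1(x),\ \beta \in \mathcal{K},\ a \in \mathbb{R}\}$
has covering number $N(\varepsilon,\mathcal{F},L^1(Q)) \le A\varepsilon^{-W}$,
for any probability $Q$ and $0 < \varepsilon < 1$.

Then we have

(a) $\sup_{\beta\in\mathcal{K},\,a\in\mathbb{R}} \|S_n(a,\beta,\cdot) - S(a,\beta,\cdot)\|_{0,\infty} \xrightarrow{a.s.} 0$;

(b) if $\inf_{\beta\in\mathcal{K},\,t\in\mathcal{T}_0} [\lim_{|a|\to\infty} S(a,\beta,t) - S(\eta_\beta(t),\beta,t)] > 0$, then

$\sup_{\beta\in\mathcal{K}} \|\hat\eta_\beta - \eta_\beta\|_{0,\infty} \xrightarrow{a.s.} 0$.

Theorem 3.2. Let $\hat\beta$ be the minimizer of $F_n(\beta)$, where
$F_n(\beta)$ is defined as in (5), with $\hat\eta_\beta$ satisfying

(16) $\sup_{\beta\in\mathcal{K}} \|\hat\eta_\beta - \eta_\beta\|_\infty \xrightarrow{a.s.} 0$

for any compact set $\mathcal{K}$ in $\mathbb{R}^p$. If C1 holds, then

(a) $\sup_{\beta\in\mathcal{K}} |F_n(\beta) - F(\beta)| \xrightarrow{a.s.} 0$;

(b) if, in addition, there exists a compact set $\mathcal{K}_1$ such that
$\lim_{m\to\infty} P(\cap_{n\ge m}\,\hat\beta \in \mathcal{K}_1) = 1$ and
$F(\beta)$ has a unique minimum at $\beta_0$, then
$\hat\beta \xrightarrow{a.s.} \beta_0$.

Remark 3.2. Theorems 3.1 and 3.2 entail that
$\|\hat\eta_{\hat\beta} - \eta_0\|_{0,\infty} \xrightarrow{a.s.} 0$, since
$\eta_\beta(t)$ is continuous. For the covering number used in Condition
(ii) of Theorem 3.1, see [20].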



4. Asymptotic normality. From now on, T is assumed to be a compact 
set. A set of assumptions denoted N1-N6, under which the resulting esti- 
mates are asymptotically normally distributed, are detailed in the Appendix. 



Theorem 4.1. Assume that the $t_i$'s are random variables with
distribution on a compact set $\mathcal{T}$ and that N1-N6 hold. Then for
any consistent solution $\hat\beta$ of (15), we have

$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{D} N(0, A^{-1}\Sigma(A^{-1})^T)$,

where $A$ is defined in N3 and $\Sigma$ is defined in N4.



Remark 4.1. Theorem 4.1 can be used to construct a Wald-type statistic to
make inferences involving only a subset of the regression parameter, that
is, when we want to test $H_0 : \beta_{(2)} = 0$, with
$\beta^T = (\beta_{(1)}^T, \beta_{(2)}^T)$.

Likelihood ratio-type tests can also be used based on the robust
quasi-likelihood introduced in Section 2, as was done for generalized
linear models by Cantoni and Ronchetti [9], or on the robustified deviance.
A robust measure of discrepancy between the two models is defined as

$\Lambda = 2\left[\sum_{i=1}^{n} \rho(y_i, x_i^T\hat\beta + \hat\eta_{\hat\beta}(t_i))w_2(x_i) - \sum_{i=1}^{n} \rho(y_i, x_i^T\hat\beta_0 + \hat\eta_{\hat\beta_0}(t_i))w_2(x_i)\right]$,

where $\hat\beta_0 = (\hat\beta_{(1)}^T, 0^T)$ is the estimate of $\beta$
under the null hypothesis. Both estimates $\hat\beta$ and $\hat\beta_0$
need to be computed using the same loss function $\rho$ considered in
$\Lambda$, in order to ensure that $\Lambda$ will behave asymptotically as
a linear combination of independent chi-square random variables with one
degree of freedom. As in [9], it can be seen that
$\Lambda = nU_{n(2)}^T A_{22.1} U_{n(2)} + o_p(1)$, with
$A_{22.1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$ and
$\sqrt{n}\,U_n \xrightarrow{D} N(0, A^{-1}\Sigma(A^{-1})^T)$.
5. Monte Carlo study. A small-scale simulation study was carried out to
assess the performance of the robust estimators considered in this paper.
A one-dimensional covariate $x$ and a nonparametric function $\eta(t)$ were
considered. The modified likelihood estimator (MOD) used the score function
of Croux and Haesbroeck [12] with $c = 0.5$. With this choice, the function
$G(s)$ has an explicit expression, so no numerical integration is
necessary. The weight functions take the form

$w_1^2(x_i) = w_2^2(x_i) = \{1 + (x_i - M_n)^2\}^{-1}$,

where $M_n = \mathrm{Median}\{x_j : j = 1,\ldots,n\}$ is the sample median.

The two competitors considered in the study were the quasi-likelihood
estimator (QAL) of Severini and Staniswalis [21] and the robust
quasi-likelihood estimator (RQL) of Cantoni and Ronchetti [9]. For the
latter, the Huber function $\psi_c(x) = \max\{-1.2, \min(1.2, x)\}$ was
used with the same weight functions as above. The QAL estimator corresponds
to $\psi_c(x) = x$ and $w_1(x) = w_2(x) = 1$. In all cases, the kernel
$K(t) = \max\{0, 1 - |t|\}$ was used. In Studies 1 and 3 below, the search
for $\beta$ uses a grid of size 0.05, while in Study 2 the grid size is
0.01.
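
The leverage weights used in the study can be computed as in the sketch below. It assumes that the displayed formula defines the squared weights, so that each weight is $\{1 + (x_i - M_n)^2\}^{-1/2}$; the function name is hypothetical.

```python
import numpy as np

def leverage_weights(x):
    """Weights w(x_i) with w(x_i)^2 = 1 / (1 + (x_i - median(x))^2)."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)  # sample median M_n
    return 1.0 / np.sqrt(1.0 + (x - m) ** 2)

x_demo = np.array([-1.0, 0.0, 1.0, 2.0, 10.0])
w_demo = leverage_weights(x_demo)
```

In this toy sample the median is 1, so the observation at x = 1 keeps weight 1 while the point at x = 10 receives weight $1/\sqrt{82}$, roughly 0.11.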

An important issue in any smoothing procedure is the choice of the
smoothing parameter. Under a nonparametric regression model with
$\beta = 0$ and $H(t) = t$, two commonly used approaches are
cross-validation and plug-in. However, these procedures may not be robust;
their sensitivity to anomalous data was discussed by several authors,
including [7, 10, 18, 24]. Wang and Scott [24] note that in the presence of
outliers, the least squares cross-validation function is nearly constant on
its whole domain and, thus, essentially worthless for the purpose of
choosing a bandwidth. The robustness issue remains for the estimators
considered in this paper. With a small bandwidth, a small number of
outliers with similar values of $t_i$ could easily drive the estimate of
$\eta$ to dangerous levels. Therefore, we may consider a robust
cross-validation approach as follows:

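• Select at random a subset of size $100(1-\alpha)\%$. Let $I_{1-\alpha}$
denote the indexes of these observations and $J_{1-\alpha}$ the indexes of
the remaining ones.

• For each given $h$, compute

$\hat\eta_\beta^{(\alpha)}(t,h) = \arg\min_{a\in\mathbb{R}} \sum_{i\in I_{1-\alpha}} W_i(t,h)\rho(y_i, x_i^T\beta + a)w_1(x_i)$,

$\hat\beta^{(\alpha)}(h) = \arg\min_{\beta\in\mathbb{R}^p} \sum_{i\in I_{1-\alpha}} \rho(y_i, x_i^T\beta + \hat\eta_\beta^{(\alpha)}(t_i,h))w_2(x_i)$,

where $W_i(t,h) = \{\sum_{j=1}^{n} K((t-t_j)/h)\}^{-1}K((t-t_i)/h)$.

• Choose

$\hat h_n = \arg\min_h \sum_{i\in J_{1-\alpha}} \rho(y_i, x_i^T\hat\beta^{(\alpha)}(h) + \hat\eta^{(\alpha)}_{\hat\beta^{(\alpha)}(h)}(t_i,h))w_2(x_i)$.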
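
The split-sample selection above can be organized as in the following sketch. It is a generic skeleton under assumptions, not the authors' code: `fit` and `validation_loss` are hypothetical callables supplied by the user, the first fitting the two-step estimator on the training indexes for a given bandwidth and the second evaluating the weighted loss of that fit on the held-out indexes.

```python
import numpy as np

def split_sample_bandwidth(n, bandwidths, fit, validation_loss, alpha=0.2, seed=0):
    """Pick the bandwidth minimizing a held-out loss after one random split.

    fit(train_idx, h) returns a fitted model for bandwidth h;
    validation_loss(model, valid_idx) returns the loss summed over valid_idx.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round((1.0 - alpha) * n))
    # I_{1-alpha}: training indexes; J_{1-alpha}: the remaining validation indexes.
    train_idx, valid_idx = perm[:n_train], perm[n_train:]
    scores = np.array(
        [validation_loss(fit(train_idx, h), valid_idx) for h in bandwidths]
    )
    return bandwidths[int(np.argmin(scores))], scores, train_idx, valid_idx

# Toy callables, only to show the calling convention.
candidates = [0.1, 0.2, 0.3]
h_best, scores, train_idx, valid_idx = split_sample_bandwidth(
    100,
    candidates,
    fit=lambda idx, h: h,
    validation_loss=lambda model, idx: (model - 0.2) ** 2,
)
```

Using a robust loss inside `validation_loss` is what distinguishes this scheme from least squares cross-validation, since a few outlying validation points then cannot dominate the criterion.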
When the sample size $n$ is small, the leave-one-out cross-validation,
which is similar to the approach considered here, is usually preferred.
When $n$ is modestly large, the v-fold cross-validation is often used.
However, both of them are computationally expensive. Based on our
experience with a number of data sets, including some from Study 1 below,
we found that the approach considered here is helpful. A full evaluation of
this approach has not yet been completed.

To measure performance through simulation, we use the bias and standard
deviation for the $\beta$ estimate as well as the mean square error of the
function estimate

$\mathrm{MSE}(\hat\eta) = n^{-1}\sum_{i=1}^{n} [\hat\eta(t_i) - \eta(t_i)]^2$.

We report the comparisons in three scenarios as follows. 

Study 1. Random samples of size $n = 100$ were generated from the model

$t \sim U(\{0.1, 0.2, \ldots, 1.0\})$, $y|(x,t) \sim Bi(10, p(x,t))$,

where $\log(p(x,t)/(1 - p(x,t))) = 3x + e^{2t} - 4$. We summarized the
results over 100 runs in Table 1, using three different bandwidths,
$h_n = 0.1$, $h_n = 0.2$ and $h_n = 0.3$. The three estimates are labeled
as QAL($h_n$), RQL($h_n$) and MOD($h_n$). Figure 1 gives the histograms of
the estimates of $\beta$ for each method and bandwidth. It is clear that
the robust estimators RQL and MOD have similar performance and that the
relative efficiencies of the MOD($h_n$) are between 0.69 and 0.80, as
compared to QAL($h_n$). The MOD method tends to have smaller bias than the
RQL method and even than the QAL method. The normality of $\hat\beta$
appeared to hold up quite well at this sample size.



Table 1
Summary results for Study 1

            Bias($\hat\beta$)   SD($\hat\beta$)   MSE($\hat\beta$)   MSE($\hat\eta$)
QAL(0.1)         0.059              0.219             0.051              0.111
QAL(0.2)         0.033              0.214             0.047              0.073
QAL(0.3)         0.004              0.220             0.048              0.152
RQL(0.1)        -0.051              0.242             0.061              0.114
RQL(0.2)        -0.054              0.254             0.067              0.089
RQL(0.3)        -0.105              0.262             0.080              0.154
MOD(0.1)         0.030              0.252             0.064              0.143
MOD(0.2)         0.018              0.251             0.063              0.088
MOD(0.3)        -0.001              0.252             0.064              0.135
Fig. 1. Histograms of $\hat\beta$ for QAL, RQL and MOD using bandwidths
$h_n$ = 0.1, 0.2 and 0.3. (One panel per method and bandwidth; horizontal
axes run from 2.0 to 4.0.)



We also applied the data-adaptive method described in this section for
choosing $h_n$ based on a split of the sample into a training set (80% of
the data) and a validation set (20%). On a total of ten random samples for
Study 1, the resulting $h_n$ are mostly between 0.1 and 0.2. From Table 1,
we may observe that $h_n = 0.2$ is indeed a good choice, but the
performance of $\hat\beta$ is not very sensitive to the choice of $h_n$.

Study 2. To see how the robust estimators protect us from gross errors
in the data, we generated a data set of size $n = 100$ from the model

$x \sim N(0,1)$, $t \sim N(1/2, 1/6)$, $y|(x,t) \sim Bi(10, p(x,t))$,

where $\log(p(x,t)/(1 - p(x,t))) = 2x + 0.2$. We then replaced the first
one, two and three observations by gross outliers. Table 2 gives the parameter
Table 2
Estimates of $\beta$ (true value of 2) in Study 2. $(x_i,y_i)$,
$1 \le i \le 3$, denote the three contaminating points which replace the
first three observations one by one

                          QAL     RQL     MOD
Original data             2.02    2.08    1.99
$x_1 = 10$, $y_1 = 0$     0.90    2.07    2.00
$x_2 = -10$, $y_2 = 10$   0.31    2.06    1.97
$x_3 = -10$, $y_3 = 10$   0.12    2.05    1.95


Table 3
Summary results for Study 3

Data                 Estimator   Bias($\hat\beta$)   SD($\hat\beta$)   MSE($\hat\beta$)   MSE($\hat\eta$)
Original             QAL              0.126              0.357             0.143              0.297
Original             RQL              0.199              0.409             0.207              0.348
Original             MOD              0.158              0.386             0.174              0.317
Contaminated $C_1$   QAL             -0.393              0.366             0.288              0.378
Contaminated $C_1$   RQL             -0.171              0.440             0.223              0.378
Contaminated $C_1$   MOD             -0.245              0.414             0.231              0.365
Contaminated $C_2$   QAL             -0.935              0.287             0.957              0.446
Contaminated $C_2$   RQL              0.018              0.545             0.297              0.399
Contaminated $C_2$   MOD             -0.237              0.436             0.246              0.350
Contaminated $C_3$   QAL             -2.187              0.071             4.788              0.402
Contaminated $C_3$   RQL              0.177              0.430             0.216              0.400
Contaminated $C_3$   MOD             -0.037              0.475             0.227              0.369


estimates under the contaminated data, with $h_n = 0.1$, where $(x_i,y_i)$,
$1 \le i \le 3$, denote the outliers. It is clear that the QAL estimate of
$\beta$ was very sensitive to a single outlier, whereas the robust
estimators remained stable.

Study 3. We considered data sets of size $n = 200$ which are generated
from a bivariate normal distribution $(x_i,t_i) \sim N((0, 1/2), \Sigma)$,
truncated to $t \in [1/4, 3/4]$, with

$\Sigma = \begin{pmatrix} 1 & 1/(6\sqrt{3}) \\ 1/(6\sqrt{3}) & 1/36 \end{pmatrix}$.

The response variable was then generated as

$y_i = \begin{cases} 1, & \beta_0 x_i + \eta_0(t_i) + \varepsilon_i \ge 0,\\ 0, & \beta_0 x_i + \eta_0(t_i) + \varepsilon_i < 0, \end{cases}$

where $\beta_0 = 2$, $\eta_0(t) = 2\sin(4\pi t)$ and $\varepsilon_i$ was a
standard logistic variate. For each data set generated from this model, we
also created three contaminated data sets, denoted $C_1$, $C_2$ and $C_3$
in Table 3. The purpose of the first two contaminations is to see how the
robust methods work when one has contamination in $y$ only.

• Contamination 1. The contaminated data points were generated as
follows: $u_i \sim U(0,1)$, $x_i$ is left unchanged and $y_i$ is replaced by

$\begin{cases} y_i & \text{if } u_i \le 0.90,\\ \text{a new observation from } Bi(1, 0.5) & \text{if } u_i > 0.90. \end{cases}$

• Contamination 2. For each generated data set, we chose ten "design
points" with $H(\beta_0 x_i + \eta_0(t_i)) > 0.99$, where
$H(u) = 1/(1 + \exp(-u))$, so at those points, the conditional mean of $y$
given the covariates is not close to 0.5. We then contaminate $y$ as in
Contamination 1, but only at those ten points. Of those ten points, about
half are expected to be outliers with large Pearson residuals.

• Contamination 3. Here, we considered a contamination with bad leverage
points by using $u_i \sim U(0,1)$, with $x_i$ replaced by

$\begin{cases} x_i & \text{if } u_i \le 0.90,\\ \text{a new observation from } N(10, 1) & \text{if } u_i > 0.90, \end{cases}$

and $y_i$ replaced by

$\begin{cases} y_i & \text{if } u_i \le 0.90,\\ \text{a new observation from } Bi(1, 0.05) & \text{if } u_i > 0.90. \end{cases}$

Both the original and the contaminated data sets were analyzed using the
three competing estimators. Using a bandwidth of $h_n = 0.1$, we summarized
the results in Table 3 based on 100 Monte Carlo samples. The bandwidth was
chosen to be smaller than that used in Study 1 because we have 200 distinct
observed values of $t$ here, as compared to ten in the earlier study.
Table 3 shows the poor performance of the classical estimates of $\beta$,
especially under contamination $C_3$. Under $C_1$, most contaminated $y$ do
not result in large Pearson residuals and the robust estimators RQL and MOD
can improve the nonrobust estimator somewhat, but not as significantly as
under $C_2$ and $C_3$. With respect to the estimation of $\eta$, all
procedures seem to be stable because the magnitude of outlying $y$ is very
limited in this case.



Our studies show the good performance of the two families of robust
estimators considered here in the presence of outliers. The MOD method
often shows smaller bias for estimating $\beta$, but its mean square error
is usually similar to that of RQL.
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APPENDIX 
A.l. Proof of the consistency results. 

Proof of Theorem 3.1. (a) Let Zi(a,/3) = p(yi,xJ/3 + a)u>i(xj), 

n 

Rm(a, (3, t) = (n^)" 1 £ Z,(a,(3)K({t - U)/h n ), 



i=i 
n 



R 0n {t) = (n/ in )- 1 ^^((i - ti)/h n ). 



i=l 



Then S n (a,/3,t) = R\ n (a, f3,t)/ Ro n (t), which implies that 
sup \\S n (a,f3, •) -S(a,(3, -)|| oo 

aGM 



< 



sup ||i?i n (a,/3, •) - E(R ln (a,P, -))|| , 
eeic 

aGM 



+ sup ^(^(a, (3, •)) - 5(a, A -).E0Ron(0)llo oo 

0G/C 
aGR 

+ ||p||oo||'^l||cc||-Ron ~~ E(Ro n )\\o )00 { inf i?0n(*)} , 

where HpH^ = sup (2/j0 ) |p(y, a)| and \\wi\\oo = sup x |wi(x)|. 

Since E(Ro n (t)) = J K(u)fT(t — uh n )du > Ai(T$), it is enough to show 
that 

(A.l) sup \\R ln (a,0, •) - E(R ln (a,f3, -))llo,oo ^ 0, 

/3G/C 
aGM 

(A. 2) \\Ron — E(R( jn )\\o^ 00 — — > 0, 

(A.3) sup\\E(R ln (a,f3,-)) - S(a,(3,-)E(R 0n (-))\\ 0) oo -> 0. 

/3G/C 
aGR 

Assumptions C2-C4 imply (A. 2); see [20], page 35. On the other hand, 
(A.3) follows easily from the boundness of p, the integrability of the ker- 
nel, the equicontinuity condition C5 and the fact that h n — > 0. In order to 
prove (A.l), let us consider the class of functions 

$$\mathcal F_n = \bigl\{f_{t,a,\beta,h_n}(y,x,v) = B^{-1}\rho(y,x^{T}\beta+a)\,w_1(x)\,K_{t,h_n}(v)\bigr\},$$

with $B = \|\rho\|_\infty\|w_1\|_\infty\|K\|_\infty$ and $K_{t,h_n}(v) = K((t-v)/h_n)$. Using the fact that the graphs of translated kernels $K_{t,h_n}$ have polynomial discrimination, the inequality $0\le K_{t,h_n}\le\|K\|_\infty$ and assumption (ii), we obtain that $N(\varepsilon,\mathcal F_n,L^1(Q))\le A_1\varepsilon^{-W_1}$ for any probability $Q$ and $0<\varepsilon<1$, where $A_1$ and $W_1$ do not depend on $n$. Since for any $f_{t,a,\beta,h_n}\in\mathcal F_n$, $|f_{t,a,\beta,h_n}|\le1$ and $E(f^2_{t,a,\beta,h_n}(y_1,x_1,t_1))\le h_n\|K\|_\infty^{-1}\|f_T\|_\infty$, Theorem 37 in [20] and C4 imply that

$$\sup_{f_{t,a,\beta,h_n}\in\mathcal F_n} h_n^{-1}\Bigl|\frac1n\sum_{i=1}^{n} f_{t,a,\beta,h_n}(y_i,x_i,t_i) - E\bigl(f_{t,a,\beta,h_n}(y_1,x_1,t_1)\bigr)\Bigr|\xrightarrow{a.s.}0,$$

which concludes the proof of (A.1).

(b) The continuity of $\eta_\beta(t)$ implies that $\eta_\beta(t)$ is bounded for $t\in\mathcal T_0$ and $\beta\in\mathcal K$, and, thus, that there exists a compact set $\mathcal A(\mathcal T_0,\mathcal K)$ such that $\eta_\beta(t)\in\mathcal A(\mathcal T_0,\mathcal K)$ for any $t\in\mathcal T_0$ and $\beta\in\mathcal K$. Assume that $\sup_{\beta\in\mathcal K}\|\hat\eta_\beta-\eta_\beta\|_{0,\infty}$ does not converge to 0 in a set $\Omega_0$ with $P(\Omega_0)>0$. Then for each $\omega\in\Omega_0$, there exists a sequence $(\beta_k,t_k)$ such that $t_k\in\mathcal T_0$, $\beta_k\in\mathcal K$ and $\hat\eta_{\beta_k}(t_k)-\eta_{\beta_k}(t_k)\to c\ne0$. Since $\mathcal T_0$ and $\mathcal K$ are compact, without loss of generality we can assume that $t_k\to t_L\in\mathcal T_0$ and $\beta_k\to\beta_L\in\mathcal K$ and hence obtain that $\eta_{\beta_k}(t_k)\to\eta_{\beta_L}(t_L)$, implying that $\hat\eta_{\beta_k}(t_k)-\eta_{\beta_L}(t_L)\to c$. When $c<\infty$, the same steps as those used in Lemma A1 of [11] lead to a contradiction. If $c=\infty$, we have that $\hat\eta_{\beta_k}(t_k)\to\infty$. By assumption, we have that

$$0 < i = \inf_{\beta\in\mathcal K,\ t\in\mathcal T_0}\Bigl[\lim_{|a|\to\infty}S(a,\beta,t)-S(\eta_\beta(t),\beta,t)\Bigr],$$

and so $\lim_{|a|\to\infty}S(a,\beta_L,t_L)-S(\eta_{\beta_L}(t_L),\beta_L,t_L)\ge i$. Thus, for $k$ sufficiently large, $S(\hat\eta_{\beta_k}(t_k),\beta_L,t_L) > S(\eta_{\beta_L}(t_L),\beta_L,t_L)+i/2$. The equicontinuity condition implies that given $\varepsilon>0$, for $k$ sufficiently large, $S(\eta_{\beta_L}(t_L),\beta_k,t_k) < S(\eta_{\beta_L}(t_L),\beta_L,t_L)+\varepsilon/4$ and $S(\hat\eta_{\beta_k}(t_k),\beta_L,t_L) < S(\hat\eta_{\beta_k}(t_k),\beta_k,t_k)+\varepsilon/4$, which, from (a) and the definition of $\hat\eta_\beta$, implies that $S(\hat\eta_{\beta_k}(t_k),\beta_L,t_L) < S_n(\hat\eta_{\beta_k}(t_k),\beta_k,t_k)+\varepsilon/2 \le S_n(\eta_{\beta_L}(t_L),\beta_k,t_k)+\varepsilon/2$. Again using (a), we obtain $S(\hat\eta_{\beta_k}(t_k),\beta_L,t_L) < S_n(\eta_{\beta_L}(t_L),\beta_k,t_k)+\varepsilon/2 \le S(\eta_{\beta_L}(t_L),\beta_k,t_k)+3\varepsilon/4 < S(\eta_{\beta_L}(t_L),\beta_L,t_L)+\varepsilon$. Hence, for $k$ sufficiently large, $S(\eta_{\beta_L}(t_L),\beta_L,t_L)+i/2 < S(\hat\eta_{\beta_k}(t_k),\beta_L,t_L) < S(\eta_{\beta_L}(t_L),\beta_L,t_L)+\varepsilon$, leading to a contradiction once $\varepsilon<i/2$. □



The next proposition states a general uniform convergence result which 
will be helpful in proving Theorems 3.2 and 4.1. 

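We will begin by fixing some notation. Denote by $C^1(\mathcal T)$ the set of continuously differentiable functions in $\mathcal T$. Note that if $S^1(a,\beta,\tau)$ defined in (12) is continuously differentiable with respect to $(a,\tau)$, then $\eta_\beta\in C^1(\mathcal T)$. $\mathcal V(\beta)$ and $\mathcal H_\delta(\beta)$ denote neighborhoods of $\beta\in\mathcal K$ and $\eta_\beta$ such that $\mathcal V(\beta)\subset\mathcal K$ and

$$\mathcal H_\delta(\beta) = \Bigl\{u\in C^1(\mathcal T): \|u-\eta_\beta\|_\infty\le\delta,\ \Bigl\|\frac{\partial}{\partial t}u-\frac{\partial}{\partial t}\eta_\beta\Bigr\|_\infty\le\delta\Bigr\}.$$

Proposition A.1. Let $(y_i,x_i,t_i)$ be independent observations such that $y_i|(x_i,t_i)\sim F(\cdot,\mu_i)$, with $\mu_i = H(\eta_0(t_i)+x_i^{T}\beta_0)$ and $\mathrm{VAR}(y_i|(x_i,t_i)) = V(\mu_i)$. Assume that $t_i$ are random variables with distribution on $\mathcal T$. Let $g:\mathbb R^2\to\mathbb R$ be a continuous and bounded function, $W(x,t):\mathbb R^{p+1}\to\mathbb R$ be such that $E(|W(x,t)|)<\infty$ and $\eta_\beta(t) = \eta(\beta,t):\mathbb R^{p+1}\to\mathbb R$ be a continuous function of $(\beta,t)$. Define $L(y,x,t,\beta,v) = g(y,x^{T}\beta+v(t))\,W(x,t)$ and $E(\beta) = E_0(L(y,x,t,\beta,\eta_\beta))$. Then

(a) $E\bigl(n^{-1}\sum_{i=1}^{n}L(y_i,x_i,t_i,\theta,v)\bigr)\to E(\beta)$ when $\|\theta-\beta\|+\|v-\eta_\beta\|_\infty\to0$,

(b) $\sup_{\theta\in\mathcal K}\bigl|n^{-1}\sum_{i=1}^{n}L(y_i,x_i,t_i,\theta,\eta_\theta)-E(L(y_1,x_1,t_1,\theta,\eta_\theta))\bigr|\xrightarrow{a.s.}0$,

(c) $\sup_{\theta\in\mathcal K,\ v\in\mathcal H_1(\beta)}\bigl|n^{-1}\sum_{i=1}^{n}L(y_i,x_i,t_i,\theta,v)-E(L(y_1,x_1,t_1,\theta,v))\bigr|\xrightarrow{a.s.}0$ if, in addition, $\mathcal T$ is compact and $\eta_\beta\in C^1(\mathcal T)$.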

Proof. (a) follows from the dominated convergence theorem. The proofs of (b) and (c) follow using the continuity of $\eta_\beta$ and $g$, Theorem 3 in Chapter 2 of [20], the compactness of $\mathcal K$ and $\mathcal H_1(\beta)$ and analogous arguments to those considered in [2]. □

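Remark A.1. Proposition A.1 implies that for any weakly consistent estimate $\hat\eta_\beta$ of $\eta_\beta$ such that $\sup_{t\in\mathcal T}|(\partial/\partial t)\hat\eta_\beta(t)-(\partial/\partial t)\eta_\beta(t)|\xrightarrow{p}0$ and $\sup_{t\in\mathcal T}|\hat\eta_\beta(t)-\eta_\beta(t)|\xrightarrow{p}0$, we have $(1/n)\sum_{i=1}^{n}L(y_i,x_i,t_i,\beta,\hat\eta_\beta)\xrightarrow{p}E(\beta)$. An analogous result can be obtained replacing $\xrightarrow{p}$ by $\xrightarrow{a.s.}$.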

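Proof of Theorem 3.2. (a) Define

$$\tilde F_n(\beta) = \frac1n\sum_{i=1}^{n}\rho(y_i,x_i^{T}\beta+\eta_\beta(t_i))\,w_2(x_i).$$

For any $\varepsilon>0$, let $\mathcal T_0$ be a compact set such that $P(t_i\notin\mathcal T_0)<\varepsilon$ and let $\tau_n = (1/n)\sum_{i=1}^{n}I(t_i\notin\mathcal T_0)$. We then have

$$\sup_{\beta\in\mathcal K}|F_n(\beta)-\tilde F_n(\beta)|\le\|\Psi\|_\infty\|w_2\|_\infty\sup_{\beta\in\mathcal K}\|\hat\eta_\beta-\eta_\beta\|_{0,\infty}+2\|\rho\|_\infty\|w_2\|_\infty\tau_n.$$

Hence, using (16) and the strong law of large numbers, we easily get that

$$\sup_{\beta\in\mathcal K}|F_n(\beta)-\tilde F_n(\beta)|\xrightarrow{a.s.}0. \tag{A.4}$$

Moreover, Proposition A.1(b) with $W(x,t) = w_2(x)$ and $g(y,u) = \rho(y,u)$ implies that $\sup_{\beta\in\mathcal K}|\tilde F_n(\beta)-F(\beta)|\xrightarrow{a.s.}0$ which, together with (A.4), concludes the proof of (a).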

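(b) Note that (a) implies that $F_n(\hat\beta) = \inf_{\beta\in\mathcal K}F_n(\beta)\xrightarrow{a.s.}\inf_{\beta\in\mathcal K}F(\beta) = F(\beta_0)$ and $F_n(\hat\beta)-F(\hat\beta)\xrightarrow{a.s.}0$, and so $F(\hat\beta)\xrightarrow{a.s.}F(\beta_0)$. Since $F$ has a unique minimum at $\beta_0$, (b) follows easily. □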
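The two-step structure behind Theorems 3.1 and 3.2 (first minimize the kernel-weighted objective in $a$ for each $\beta$ and $t$, then minimize $F_n$ in $\beta$) can be illustrated with a deliberately simple grid-search sketch. Everything here is an assumption made for illustration: a scalar covariate, weights $w_1 = w_2 \equiv 1$, an Epanechnikov kernel, a truncated squared-error loss standing in for $\rho$, and the hypothetical helper names `eta_hat`, `f_n`, `beta_hat` and `toy_data`. It is not the computational procedure used in the paper.

```python
def kernel(u):
    """Epanechnikov kernel (illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rho(y, s, cap=4.0):
    """Hypothetical bounded loss: squared error truncated at `cap`."""
    return min((y - s) ** 2, cap)


def eta_hat(beta, t, data, h, a_grid):
    """Grid minimizer in a of sum_i K((t - t_i)/h) * rho(y_i, x_i*beta + a)."""
    def s_n(a):
        return sum(kernel((t - ti) / h) * rho(y, x * beta + a)
                   for y, x, ti in data)
    return min(a_grid, key=s_n)


def f_n(beta, data, h, a_grid):
    """Profile objective: mean of rho(y_i, x_i*beta + eta_hat_beta(t_i))."""
    total = 0.0
    for y, x, t in data:
        total += rho(y, x * beta + eta_hat(beta, t, data, h, a_grid))
    return total / len(data)


def beta_hat(data, h, beta_grid, a_grid):
    """Grid minimizer of the profile objective f_n."""
    return min(beta_grid, key=lambda b: f_n(b, data, h, a_grid))


def toy_data(n=20, outlier=50.0):
    """Noise-free y = 2*x + t with one gross outlier in the first response."""
    data = []
    for i in range(n):
        t = i / (n - 1)
        x = float((7 * i) % 5 - 2)   # cycles through -2, 0, 2, -1, 1
        data.append((2.0 * x + t, x, t))
    y0, x0, t0 = data[0]
    data[0] = (y0 + outlier, x0, t0)
    return data
```

With a bounded loss, the outlying response contributes the same capped amount to the objective for every candidate value on these grids, so it does not move the minimizer; this is the mechanism that the uniform bounds involving $\|\rho\|_\infty$ exploit in the proofs.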
A.2. Proof of the asymptotic normality of the regression estimates. For the sake of simplicity, we denote

$$\chi(y,a) = \frac{\partial}{\partial a}\Psi(y,a),\qquad \chi_1(y,a) = \frac{\partial^2}{\partial a^2}\Psi(y,a), \tag{A.5}$$

$$\hat v(\beta,t) = \hat\eta_\beta(t)-\eta_\beta(t),\qquad \hat v_0(t) = \hat v(\beta_0,t),\qquad \hat v_j(\beta,t) = \frac{\partial\hat v(\beta,t)}{\partial\beta_j},\qquad \hat v_{j,0}(t) = \hat v_j(\beta_0,t). \tag{A.6}$$



We list a set of conditions needed for the asymptotic normality theorem, followed by general comments on those conditions. The first condition is on the preliminary estimate of $\eta_\beta(t)$ and the rest are on the score functions and the underlying model distributions.

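N1. (a) The functions $\hat\eta_\beta(t)$ and $\eta_\beta(t)$ are continuously differentiable with respect to $(\beta,t)$ and twice continuously differentiable with respect to $\beta$ such that $(\partial^2\eta_\beta(t))/\partial\beta_j\partial\beta_\ell|_{\beta=\beta_0}$ is bounded. Furthermore, for any $1\le j,\ell\le p$, $(\partial^2\eta_\beta(t))/\partial\beta_j\partial\beta_\ell$ satisfies the following equicontinuity condition:

$$\forall\varepsilon>0,\ \exists\delta>0:\ |\beta_1-\beta_0|<\delta\ \Rightarrow\ \Bigl\|\frac{\partial^2}{\partial\beta_j\partial\beta_\ell}\eta_\beta\Big|_{\beta=\beta_1}-\frac{\partial^2}{\partial\beta_j\partial\beta_\ell}\eta_\beta\Big|_{\beta=\beta_0}\Bigr\|_\infty<\varepsilon.$$

(b) $\|\hat\eta_{\hat\beta}-\eta_0\|_\infty\xrightarrow{p}0$ for any consistent estimate $\hat\beta$ of $\beta_0$.

(c) For each $t\in\mathcal T$ and $\beta$, $\hat v(\beta,t)\xrightarrow{p}0$. Moreover, $n^{1/4}\|\hat v_0\|_\infty\xrightarrow{p}0$ and $n^{1/4}\|\hat v_{j,0}\|_\infty\xrightarrow{p}0$ for all $1\le j\le p$.

(d) There exists a neighborhood of $\beta_0$ with closure $\mathcal K$, such that for any $1\le j,\ell\le p$, $\sup_{\beta\in\mathcal K}\bigl(\|\hat v_j(\beta,\cdot)\|_\infty+\|(\partial\hat v_j(\beta,\cdot))/\partial\beta_\ell\|_\infty\bigr)\xrightarrow{p}0$.

(e) $\|(\partial\hat v_0)/\partial t\|_\infty+\|(\partial\hat v_{j,0})/\partial t\|_\infty\xrightarrow{p}0$ for any $1\le j\le p$.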

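N2. The functions $\Psi$, $\chi$, $\chi_1$, $w_2$ and $\psi_2(x) = x\,w_2(x)$ are bounded and continuous.

N3. The matrix $A$ is nonsingular, where

$$A = E_0\Bigl[\chi(y,x^{T}\beta_0+\eta_0(t))\Bigl(x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr)\Bigl(x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr)^{T}w_2(x)\Bigr]
+ E_0\Bigl[\Psi(y,x^{T}\beta_0+\eta_0(t))\,\frac{\partial^2}{\partial\beta\,\partial\beta^{T}}\eta_\beta(t)\Big|_{\beta=\beta_0}w_2(x)\Bigr].$$

N4. The matrix $\Sigma$ is positive definite with

$$\Sigma = E_0\Bigl(\Psi^2(y,x^{T}\beta_0+\eta_0(t))\,w_2^2(x)\Bigl(x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr)\Bigl(x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr)^{T}\Bigr).$$

N5. (a) $E_0\{\Psi(y,x^{T}\beta_0+\eta_0(t))\,|\,(x,t)\} = 0$.

(b) $E_0\bigl[\{\chi(y,x^{T}\beta_0+\eta_0(\tau))(x+(\partial\eta_\beta(\tau))/\partial\beta|_{\beta=\beta_0})\}\,w_2(x)\,\big|\,t=\tau\bigr] = 0$.

N6. $E_0\bigl(w_2(x)\,\|x+(\partial\eta_\beta(t))/\partial\beta|_{\beta=\beta_0}\|^2\bigr)<\infty$.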



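Remark A.2. Conditions N1(a) and (d) imply that for any consistent estimator $\hat\beta$ of $\beta_0$, we have $\Delta_n\xrightarrow{p}0$ and $\Lambda_n\xrightarrow{p}0$ with

$$\Delta_n = \max_{1\le j\le p}\Bigl\|\frac{\partial}{\partial\beta_j}\hat\eta_\beta\Big|_{\beta=\hat\beta}-\frac{\partial}{\partial\beta_j}\eta_\beta\Big|_{\beta=\beta_0}\Bigr\|_\infty,$$

$$\Lambda_n = \max_{1\le j,\ell\le p}\Bigl\|\frac{\partial^2}{\partial\beta_j\partial\beta_\ell}\hat\eta_\beta\Big|_{\beta=\hat\beta}-\frac{\partial^2}{\partial\beta_j\partial\beta_\ell}\eta_\beta\Big|_{\beta=\beta_0}\Bigr\|_\infty.$$

Condition N1(b) follows from the continuity of $\eta_\beta(t) = \eta(\beta,t)$ with respect to $(\beta,t)$ and Theorem 3.1 that leads to $\sup_{\beta\in\mathcal K}\|\hat\eta_\beta-\eta_\beta\|_\infty\xrightarrow{a.s.}0$.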

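Remark A.3. When the kernel $K$ is continuously differentiable with derivative $K'$ bounded and with bounded variation, the uniform convergence required in N1(b)-(e) can be derived using arguments analogous to those considered in Theorem 3.1 by using the facts that

$$\frac{\partial}{\partial t}\hat\eta_\beta(t) = -\frac{(nh_n^2)^{-1}\sum_{i=1}^{n}K'((t-t_i)/h_n)\,\Psi(y_i,x_i^{T}\beta+\hat\eta_\beta(t))\,w_1(x_i)}{(nh_n)^{-1}\sum_{i=1}^{n}K((t-t_i)/h_n)\,\chi(y_i,x_i^{T}\beta+\hat\eta_\beta(t))\,w_1(x_i)},$$

$$\frac{\partial}{\partial\beta}\hat\eta_\beta(t) = -\frac{(nh_n)^{-1}\sum_{i=1}^{n}K((t-t_i)/h_n)\,\chi(y_i,x_i^{T}\beta+\hat\eta_\beta(t))\,x_i\,w_1(x_i)}{(nh_n)^{-1}\sum_{i=1}^{n}K((t-t_i)/h_n)\,\chi(y_i,x_i^{T}\beta+\hat\eta_\beta(t))\,w_1(x_i)},$$

and requiring that

$$\sup_{t\in\mathcal T}E\Bigl(\sup_{\beta\in\mathcal K}|\chi(y_1,x_1^{T}\beta+\eta_\beta(t))|\,\|x_1\|\ \Big|\ t_1=t\Bigr)<\infty,$$

$$\sup_{t\in\mathcal T}E\Bigl(\sup_{\beta\in\mathcal K}|\chi_1(y_1,x_1^{T}\beta+\eta_\beta(t))|\,\|x_1\|\ \Big|\ t_1=t\Bigr)<\infty,$$

$$\inf_{t\in\mathcal T,\ \beta\in\mathcal K}E\bigl(\chi(y_1,x_1^{T}\beta+\eta_\beta(t))\ \big|\ t_1=t\bigr)>0.$$

The uniform convergence rates required in N1(c) are fulfilled when $\hat\eta_\beta$ is defined as in (7) and a rate-optimal bandwidth is used for the kernel. The convergence requirements in N1 are analogous to those required in condition (7) in [21], page 510, and are needed in order to obtain the desired rate of convergence for the regression estimates. More precisely, assumption N1(c) avoids the bias term and ensures that $G_n(\hat\eta_{\beta_0})$ will behave asymptotically as $G_n(\eta_{\beta_0})$, where for any $\beta\in\mathbb R^p$ and any differentiable function $v_\beta(t) = v(\beta,t)$:

$$G_n(v_\beta) = \frac{1}{\sqrt n}\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\beta+v_\beta(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}v_\beta(t_i)\Bigr]w_2(x_i)\Big|_{\beta=\beta_0}.$$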
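The derivative expressions invoked in Remark A.3 can be obtained by implicit differentiation. The sketch below assumes that $\hat\eta_\beta(t)$ is an interior minimizer of the kernel-weighted objective $S_n(a,\beta,t)$, so that it solves the corresponding score equation, and that the common denominator is nonzero. The shorthand $K_i(t)$, $K_i'(t)$, $\Psi_i$ and $\chi_i$ is introduced here only for readability and is not part of the original notation.

```latex
% Shorthand (assumed): K_i(t) = K((t - t_i)/h_n), K_i'(t) = K'((t - t_i)/h_n),
% \Psi_i = \Psi(y_i, x_i^T\beta + \hat\eta_\beta(t)), \chi_i = \chi(y_i, x_i^T\beta + \hat\eta_\beta(t)).
%
% Score equation satisfied by \hat\eta_\beta(t):
\sum_{i=1}^{n} K_i(t)\,\Psi_i\,w_1(x_i) = 0 .
%
% Differentiating with respect to \beta (chain rule through x_i^T\beta + \hat\eta_\beta(t)):
\sum_{i=1}^{n} K_i(t)\,\chi_i\,\Bigl(x_i + \frac{\partial}{\partial\beta}\hat\eta_\beta(t)\Bigr) w_1(x_i) = 0
\quad\Longrightarrow\quad
\frac{\partial}{\partial\beta}\hat\eta_\beta(t)
  = -\,\frac{\sum_{i} K_i(t)\,\chi_i\,x_i\,w_1(x_i)}{\sum_{i} K_i(t)\,\chi_i\,w_1(x_i)} .
%
% Differentiating with respect to t (the kernel contributes a factor 1/h_n):
\frac{1}{h_n}\sum_{i=1}^{n} K_i'(t)\,\Psi_i\,w_1(x_i)
  + \frac{\partial}{\partial t}\hat\eta_\beta(t)\sum_{i=1}^{n} K_i(t)\,\chi_i\,w_1(x_i) = 0
\quad\Longrightarrow\quad
\frac{\partial}{\partial t}\hat\eta_\beta(t)
  = -\,\frac{h_n^{-1}\sum_{i} K_i'(t)\,\Psi_i\,w_1(x_i)}{\sum_{i} K_i(t)\,\chi_i\,w_1(x_i)} .
```

Both derivatives share the denominator $\sum_i K_i(t)\chi_i w_1(x_i)$, which is why a positive lower bound on the conditional expectation of $\chi$ appears among the requirements of the remark.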



Remark A.4. If N4 is fulfilled, then the columns of $x+(\partial\eta_\beta(t))/\partial\beta|_{\beta=\beta_0}$ will not be collinear. It is necessary not to allow $x$ to be predicted by $t$ to get root-$n$ regression estimates.

Note that for the functions considered by Bianco and Yohai [5], Croux and Haesbroeck [12] and Cantoni and Ronchetti [9], N5(a) is satisfied. This condition is the conditional Fisher consistency property as stated by Künsch, Stefanski and Carroll [17] for the generalized linear regression model.

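Note also that N5(b) is fulfilled if $w_2\equiv w_1$. Effectively, since $\eta_\beta(\tau)$ minimizes $S(a,\beta,\tau)$ for each $\tau$, it satisfies

$$E_0\bigl[\Psi(y,x^{T}\beta+\eta_\beta(\tau))\,w_1(x)\ \big|\ t=\tau\bigr] = 0;$$

thus, differentiating with respect to $\beta$, we get

$$E_0\Bigl[\chi(y,x^{T}\beta+\eta_\beta(\tau))\Bigl(x+\frac{\partial}{\partial\beta}\eta_\beta(\tau)\Bigr)w_1(x)\ \Big|\ t=\tau\Bigr] = 0.$$

Moreover, if either $w_2\equiv w_1$ or N5(a) holds, then

$$A = E_0\Bigl(\chi(y,x^{T}\beta_0+\eta_0(t))\Bigl[x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr]\Bigl[x+\frac{\partial}{\partial\beta}\eta_\beta(t)\Big|_{\beta=\beta_0}\Bigr]^{T}w_2(x)\Bigr).$$

Therefore, if $\Psi(y,u)$ is strictly monotone in $u$ and $P(w_2(x)>0) = 1$, then N3 holds, that is, $A$ will be nonsingular unless $P(a^{T}[x+(\partial\eta_\beta(t))/\partial\beta|_{\beta=\beta_0}] = 0) = 1$ for some $a\in\mathbb R^p$ (i.e., unless there is a linear combination of $x$ which can be completely determined by $t$).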
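The reduction of $A$ to a single term can be checked by conditioning. The following is a short sketch of that step, using only the definitions in N3 and N5 and the fact that the second derivative of $\eta_\beta(t)$ with respect to $\beta$ depends on the data through $t$ alone; the symbol $D(t)$ is shorthand introduced here, not notation from the paper.

```latex
% Shorthand (assumed): D(t) = \partial^2 \eta_\beta(t) / \partial\beta\,\partial\beta^T evaluated at \beta = \beta_0.
% Second term of A in N3, after conditioning on t:
E_0\bigl[\Psi(y, x^{T}\beta_0 + \eta_0(t))\,D(t)\,w_2(x)\bigr]
  = E_0\Bigl[ D(t)\; E_0\bigl\{\Psi(y, x^{T}\beta_0 + \eta_0(t))\,w_2(x)\ \big|\ t \bigr\} \Bigr].
%
% Case w_2 = w_1: the inner conditional expectation is the first-order condition
% that defines \eta_{\beta_0}(t) = \eta_0(t), so it equals 0.
%
% Case N5(a): conditioning first on (x, t),
E_0\bigl\{\Psi(y, x^{T}\beta_0 + \eta_0(t))\,w_2(x)\ \big|\ t\bigr\}
  = E_0\Bigl\{ w_2(x)\,E_0\bigl[\Psi(y, x^{T}\beta_0 + \eta_0(t))\ \big|\ (x,t)\bigr]\ \Big|\ t \Bigr\} = 0 .
% In either case the second term of A vanishes and only the \chi-term remains.
```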

Assumption N6 is used to ensure the consistency of the estimates of $A$ based on preliminary estimates of the regression parameter $\beta$ and of the functions $\eta_\beta$.

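Lemma A.1. Let $(y_i,x_i,t_i)$ be independent observations such that $y_i|(x_i,t_i)\sim F(\cdot,\mu_i)$ with $\mu_i = H(\eta_0(t_i)+x_i^{T}\beta_0)$ and $\mathrm{VAR}(y_i|(x_i,t_i)) = V(\mu_i)$. Assume that $t_i$ are random variables with distribution on a compact set $\mathcal T$ and that N1-N3 and N6 hold. Let $\tilde\beta$ be such that $\tilde\beta\xrightarrow{p}\beta_0$. Then $A_n\xrightarrow{p}A$, where $A$ is given in N3, $\hat z_i(\beta) = x_i+(\partial\hat\eta_\beta(t_i))/\partial\beta$ and

$$A_n = \frac1n\sum_{i=1}^{n}\chi(y_i,x_i^{T}\tilde\beta+\hat\eta_{\tilde\beta}(t_i))\,\hat z_i(\tilde\beta)\,\hat z_i(\tilde\beta)^{T}w_2(x_i)
+\frac1n\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\tilde\beta+\hat\eta_{\tilde\beta}(t_i))\,\frac{\partial^2}{\partial\beta\,\partial\beta^{T}}\hat\eta_\beta(t_i)\Big|_{\beta=\tilde\beta}w_2(x_i).$$

Proof. The proof follows easily using a Taylor expansion, the required assumptions, Proposition A.1 and the fact that $\tilde\beta\xrightarrow{p}\beta_0$. Details can be found in [8]. □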



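Proof of Theorem 4.1. Let $\hat\beta$ be a solution of $F_n^1(\beta) = 0$ defined in (15). Using a Taylor expansion of order one, we get

$$0 = \sum_{i=1}^{n}\Psi(y_i,x_i^{T}\beta_0+\hat\eta_{\beta_0}(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\Big|_{\beta=\beta_0}\Bigr]w_2(x_i)+nA_n(\hat\beta-\beta_0),$$

where

$$A_n = \frac1n\sum_{i=1}^{n}\chi(y_i,x_i^{T}\tilde\beta+\hat\eta_{\tilde\beta}(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\Big|_{\beta=\tilde\beta}\Bigr]\Bigl[x_i+\frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\Big|_{\beta=\tilde\beta}\Bigr]^{T}w_2(x_i)
+\frac1n\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\tilde\beta+\hat\eta_{\tilde\beta}(t_i))\,\frac{\partial^2}{\partial\beta\,\partial\beta^{T}}\hat\eta_\beta(t_i)\Big|_{\beta=\tilde\beta}w_2(x_i),$$

with $\tilde\beta$ an intermediate point between $\hat\beta$ and $\beta_0$. Note that in the partially linear regression model, only the first term in $A_n$ is different from 0 since $\hat\eta_\beta(t)$ is linear in $\beta$.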

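From Lemma A.1, we have that $A_n\xrightarrow{p}A$, where $A$ is defined in N3. Therefore, in order to obtain the asymptotic distribution of $\hat\beta$, it will be enough to derive the asymptotic behavior of

$$L_n = n^{-1/2}\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\beta_0+\hat\eta_{\beta_0}(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\Big|_{\beta=\beta_0}\Bigr]w_2(x_i).$$

Let

$$\tilde L_n = n^{-1/2}\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\beta_0+\eta_{\beta_0}(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\eta_\beta(t_i)\Big|_{\beta=\beta_0}\Bigr]w_2(x_i).$$

Using the fact that $\eta_{\beta_0} = \eta_0$ and noting that N5 implies $E[\Psi(y_i,x_i^{T}\beta_0+\eta_{\beta_0}(t_i))\,|\,(x_i,t_i)] = 0$, it follows that $\tilde L_n$ is asymptotically normally distributed with covariance matrix $\Sigma$. Therefore, it remains to show that $L_n-\tilde L_n\xrightarrow{p}0$. We have the expansion $L_n-\tilde L_n = L_n^1+L_n^2+L_n^3+L_n^4$, where

$$L_n^1 = n^{-1/2}\sum_{i=1}^{n}\chi(y_i,x_i^{T}\beta_0+\eta_0(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\eta_\beta(t_i)\Big|_{\beta=\beta_0}\Bigr]w_2(x_i)\,\hat v_0(t_i),$$

$$L_n^2 = n^{-1/2}\sum_{i=1}^{n}\Psi(y_i,x_i^{T}\beta_0+\eta_0(t_i))\,w_2(x_i)\,\hat{\mathbf v}_0(t_i),$$

$$L_n^3 = n^{-1}\sum_{i=1}^{n}\chi(y_i,x_i^{T}\beta_0+\eta_0(t_i))\,w_2(x_i)\,\bigl(n^{1/4}\hat v_0(t_i)\bigr)\bigl(n^{1/4}\hat{\mathbf v}_0(t_i)\bigr),$$

$$L_n^4 = \frac12\,n^{-1}\sum_{i=1}^{n}\chi_1(y_i,x_i^{T}\beta_0+\xi(t_i))\Bigl[x_i+\frac{\partial}{\partial\beta}\hat\eta_\beta(t_i)\Big|_{\beta=\beta_0}\Bigr]w_2(x_i)\,\bigl(n^{1/4}\hat v_0(t_i)\bigr)^2,$$

with $\hat v_0(t) = \hat\eta_{\beta_0}(t)-\eta_0(t)$, $\hat{\mathbf v}_0(t) = (\hat v_{1,0}(t),\ldots,\hat v_{p,0}(t))^{T} = \partial\hat v(\beta,t)/\partial\beta|_{\beta=\beta_0}$, $\hat v_{j,0}$ defined as in (A.6), $\chi_1$ defined as in (A.5) and $\xi(t_i)$ an intermediate point between $\hat\eta_{\beta_0}(t_i)$ and $\eta_0(t_i)$. It is easy to see that $L_n^3\xrightarrow{p}0$ and $L_n^4\xrightarrow{p}0$ follow from N1(c) and N2. To complete the proof, it remains to show that $L_n^j\xrightarrow{p}0$ for $j = 1,2$, which will follow from N1(c)-(e) and N5 using entropy arguments similar to those considered in [3]. Details can be found in [8]. □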
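For orientation, the pieces of the argument can be assembled into the limiting law they lead to. The display below is a sketch of that final step rather than a quotation of Theorem 4.1; it assumes that $A_n$ is invertible for $n$ large, which is consistent with $A_n \xrightarrow{p} A$ and the nonsingularity of $A$ in N3.

```latex
% From the first-order Taylor expansion, 0 = n^{1/2} L_n + n A_n (\hat\beta - \beta_0), hence
n^{1/2}(\hat\beta - \beta_0) = -A_n^{-1} L_n .
%
% Since L_n = \tilde L_n + o_p(1), \tilde L_n \xrightarrow{D} N(0, \Sigma) and A_n \xrightarrow{p} A,
% Slutsky's theorem gives the sandwich form
n^{1/2}(\hat\beta - \beta_0) \xrightarrow{D} N\bigl(0,\; A^{-1}\,\Sigma\,(A^{-1})^{T}\bigr).
```

Here $\Sigma$ is the matrix in N4 and $A$ the matrix in N3, so a plug-in estimate of the asymptotic covariance would require consistent estimates of both, which is the role the text assigns to assumption N6.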